Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL), yet it has been criticized for learning inefficiency. We attribute this to insufficient utilization of training signals. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch under a disjoint regulation, which raises the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to predict invisible (masked) and visible (unmasked) tokens, respectively, with superior learning targets. Rooted in orthogonal perspectives on training efficiency, DM and JD cooperatively accelerate training convergence without sacrificing the model's generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) and still report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks such as semantic segmentation and object detection, DMJD also shows superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
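The disjoint masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `disjoint_masks`, its parameters, and the fallback rule (reusing already-masked tokens once fresh ones run out) are assumptions made for illustration, since the abstract does not say how disjointness is handled when the number of views times the masking rate exceeds 1.

```python
import random


def disjoint_masks(num_tokens, mask_ratio, num_views, seed=None):
    """Sequentially sample `num_views` masked-token sets for one image.

    Hypothetical helper: each view masks int(num_tokens * mask_ratio) tokens
    and prefers tokens that no earlier view has masked. If too few fresh
    tokens remain, the remainder is drawn from tokens masked earlier
    (an assumed fallback, not taken from the paper).
    """
    rng = random.Random(seed)
    num_masked = int(num_tokens * mask_ratio)
    unused = list(range(num_tokens))
    rng.shuffle(unused)
    used = []   # tokens already masked by an earlier view
    views = []
    for _ in range(num_views):
        fresh = unused[:num_masked]          # tokens never masked so far
        unused = unused[num_masked:]
        shortfall = num_masked - len(fresh)
        refill = rng.sample(used, shortfall) if shortfall > 0 else []
        views.append(set(fresh) | set(refill))
        used.extend(fresh)
    return views


if __name__ == "__main__":
    # 14 x 14 = 196 patch tokens, four views, 25% masking rate per view.
    masks = disjoint_masks(196, 0.25, 4, seed=0)
    covered = set().union(*masks)
    print(len(covered), "of 196 tokens are reconstructed at least once")
```

With a rate of 0.25 and four views, the masked sets are strictly disjoint and together cover every token, which is the "raised token usage" the abstract refers to; each view still keeps its own masking rate.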
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
An undirected weighted graph (UWG) is frequently adopted to describe the interactions among a single set of nodes in real applications, such as the user contact frequency in a social network services system. A graph convolutional network (GCN) is widely adopted to perform representation learning on a UWG for subsequent pattern analysis tasks such as clustering or missing data estimation. However, existing GCNs mostly neglect the latent collaborative information hidden in connected node pairs. To address this issue, this study proposes to model node collaborations via a symmetric latent factor analysis model, which then serves as a node-collaboration module that supplements a GCN with a collaboration loss. Based on this idea, a Node-collaboration-informed Graph Convolutional Network (NGCN) is proposed with three ideas: a) learning latent collaborative information from the interactions of node pairs via a node-collaboration module; b) building residual connections and weighted representation propagation to obtain high representation capacity; and c) optimizing the model in an end-to-end fashion to achieve a precise representation of the target UWG. Empirical studies on UWGs from real applications demonstrate that, owing to its efficient incorporation of node collaborations, the proposed NGCN significantly outperforms state-of-the-art GCNs on the task of missing weight estimation. Meanwhile, its good scalability ensures compatibility with more advanced GCN extensions, which will be investigated further in our future studies.
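As a rough illustration of what a symmetric latent factor model does for missing weight estimation, the sketch below fits one factor vector per node so that the inner product of two node vectors approximates the observed edge weight. It is a generic sketch under stated assumptions, not the NGCN node-collaboration module: the function names, the toy graph, the plain SGD update, and all hyperparameters are hypothetical.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fit_symmetric_lf(num_nodes, weights, rank=2, lr=0.05, reg=0.01,
                     epochs=500, seed=0):
    """Fit node factors x so that dot(x[i], x[j]) approximates w_ij.

    `weights` maps an undirected edge (i, j) to its observed weight.
    Plain SGD on squared error with L2 regularization (assumed choices).
    """
    rng = random.Random(seed)
    x = [[rng.uniform(0.1, 0.5) for _ in range(rank)]
         for _ in range(num_nodes)]
    for _ in range(epochs):
        for (i, j), w in weights.items():
            err = w - dot(x[i], x[j])
            xi, xj = x[i][:], x[j][:]   # copy so both updates use old values
            for k in range(rank):
                x[i][k] += lr * (err * xj[k] - reg * xi[k])
                x[j][k] += lr * (err * xi[k] - reg * xj[k])
    return x


def squared_error(x, weights):
    return sum((w - dot(x[i], x[j])) ** 2 for (i, j), w in weights.items())


if __name__ == "__main__":
    # Toy UWG with four nodes; the weight of edge (0, 3) is unobserved.
    observed = {(0, 1): 0.8, (0, 2): 0.3, (1, 2): 0.5, (2, 3): 0.9}
    factors = fit_symmetric_lf(4, observed)
    print("estimated weight for (0, 3):", dot(factors[0], factors[3]))
```

Because the same factor matrix is used on both sides of the product, the estimate for (i, j) always equals the estimate for (j, i), which matches the symmetry of an undirected graph.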
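Three-dimensional (3D) images, such as CT, MRI, and PET, are common in medical imaging applications and important in clinical diagnosis. Semantic ambiguity is a typical feature of many medical image labels. It can be caused by many factors, such as imaging properties, pathological anatomy, and the weak representation of binary masks, which poses challenges for accurate 3D segmentation. In 2D medical images, using soft masks generated by image matting instead of binary masks to characterize lesions can provide rich semantic information and describe the structural characteristics of lesions more comprehensively, thereby benefiting subsequent diagnosis and analysis. In this work, we introduce image matting into 3D scenes to describe lesions in 3D medical images. Research on image matting in the 3D modality is limited, and there is no high-quality annotated dataset related to 3D matting, which slows the development of data-driven deep learning methods. To address this issue, we construct the first 3D medical matting dataset and convincingly verify its validity through quality control and downstream experiments on lung nodule classification. We then adapt four selected state-of-the-art 2D image matting algorithms to 3D scenes and further customize the methods for CT images. In addition, we propose the first end-to-end deep 3D matting network and implement a solid 3D medical image matting benchmark, which will be released to encourage further research.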
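Large-scale graphs are ubiquitous in real-world scenarios and can be trained on by graph neural networks (GNNs) to generate representations for downstream tasks. Given the abundant information and complex topology of a large-scale graph, we argue that redundancy exists in such graphs and degrades training efficiency. Unfortunately, limited model scalability severely restricts the efficiency of training on large-scale graphs with vanilla GNNs. Despite recent advances in sampling-based training methods, sampling-based GNNs generally overlook the redundancy issue, and training these models on large-scale graphs still takes an intolerable amount of time. We therefore propose to drop redundancy and improve the efficiency of training on large-scale graphs with GNNs by rethinking the inherent characteristics of a graph. In this paper, we pioneer a once-for-all method, termed DropReef, to drop the redundancy in large-scale graphs. Specifically, we first conduct preliminary experiments to explore potential redundancy in large-scale graphs. Next, we present a metric to quantify the neighbor heterophily of all nodes in a graph. Based on both experimental and theoretical analysis, we reveal the redundancy in a large-scale graph, namely nodes with high neighbor heterophily and a great number of neighbors. We then propose DropReef to detect and drop this redundancy once and for all, helping reduce training time while ensuring no sacrifice in model accuracy. To demonstrate the effectiveness of DropReef, we apply it to recent state-of-the-art sampling-based GNNs for training on large-scale graphs, chosen for the high accuracy of such models. With DropReef, the training efficiency of these models can be greatly improved. DropReef is highly compatible and is performed offline, benefiting present and future state-of-the-art sampling-based GNNs to a significant extent.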
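This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks. Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, comprising the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art in super-resolution of compressed images and video. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.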
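Life-long learning aims to learn a sequence of tasks without forgetting previously acquired knowledge. However, the training data involved may not be legitimate for life-long use because of privacy or copyright concerns. In practical scenarios, for instance, a model owner may wish to enable or disable the knowledge of specific tasks or specific samples from time to time. Unfortunately, such flexible control over knowledge transfer has been largely overlooked by previous incremental or decremental learning methods, even at the level of problem setting. In this paper, we explore a novel learning scheme, termed Learning with Recoverable Forgetting (LIRF), that explicitly handles task- or sample-specific knowledge removal and recovery. Specifically, LIRF introduces two innovative schemes, knowledge deposit and withdrawal, which allow user-designated knowledge to be isolated from a pre-trained network and injected back when necessary. During knowledge deposit, the specified knowledge is extracted from the target network and stored in a deposit module, while the insensitive or general knowledge of the target network is preserved and further augmented. During knowledge withdrawal, the removed knowledge is added back to the target network. The deposit and withdrawal processes require only a few epochs of fine-tuning on the removal data, ensuring both data and time efficiency. We conduct experiments on several datasets and demonstrate that the proposed LIRF strategy has encouraging generalization capability.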
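This paper presents our champion solution to the Generic Event Boundary Captioning (GEBC) competition at CVPR2022. GEBC requires the captioning model to comprehend the instantaneous status changes around a given video boundary, which makes it much more challenging than the conventional video captioning task. In this paper, we propose a Dual-Stream Transformer with improvements to both video content encoding and caption generation: (1) We use three pre-trained models to extract video features at different granularities. Moreover, we exploit the boundary type as a prompt to help the model generate captions. (2) We specifically design a model, termed Dual-Stream Transformer, to learn discriminative representations for boundary captioning. (3) To generate content-relevant and human-like captions, we improve description quality by designing a word-level ensemble strategy. The promising results on the GEBC test split demonstrate the efficacy of our proposed model.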
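Regression learning is classic and fundamental to medical image analysis. It provides a continuous mapping for many critical applications, such as attribute estimation, object detection, segmentation, and non-rigid registration. However, previous studies mainly took case-wise criteria, such as the mean squared error, as optimization objectives. They ignored the very important population-wise correlation criterion, which is exactly the final evaluation metric in many tasks. In this work, we propose to revisit the classic regression tasks with a novel investigation into directly optimizing fine-grained correlation losses. We mainly explore two complementary correlation indexes as learnable losses: Pearson linear correlation (PLC) and Spearman rank correlation (SRC). The contributions of this paper are twofold. First, for the PLC at the global level, we propose a strategy to make it robust against outliers and to regularize the key distribution factors. These efforts significantly stabilize the learning and magnify the efficacy of PLC. Second, for the SRC at the local level, we propose a coarse-to-fine scheme to ease the learning of the exact ranking order among samples. Specifically, we convert the learning of sample ranking into the learning of similarity relationships among samples. We extensively validate our method on two typical ultrasound image regression tasks, image quality assessment and biometric measurement. Experiments show that, with the fine-grained guidance of directly optimizing correlation, regression performance is significantly improved. Our proposed correlation losses are general and can be extended to more important applications.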
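For readers unfamiliar with the two correlation indexes, the sketch below computes them in plain Python and shows the simplest way a correlation can be turned into a loss value (1 minus PLC). It only illustrates the standard definitions; the outlier-robust PLC strategy and the coarse-to-fine SRC scheme proposed in the paper are not reproduced here, and the function names are hypothetical. The rank step is not differentiable, so this SRC cannot be used as a training loss as written.

```python
import math


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def plc_loss(pred, target):
    """Loss in [0, 2]; it reaches 0 when pred and target are perfectly correlated."""
    return 1.0 - pearson(pred, target)


if __name__ == "__main__":
    predictions = [0.2, 0.5, 0.4, 0.9]
    targets = [0.1, 0.6, 0.3, 0.8]
    print("PLC loss:", plc_loss(predictions, targets))
    print("SRC:", spearman(predictions, targets))
```

Unlike the mean squared error, both measures are computed over a whole batch of samples at once, which is what "population-wise" refers to in the abstract.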
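Neural architecture search methods seek optimal candidates with efficient weight-sharing supernet training. However, recent studies indicate poor ranking consistency between the performance of stand-alone architectures and that of shared-weight networks. In this paper, we present Prior-Guided One-shot NAS (PGONA) to strengthen the ranking correlation of supernets. Specifically, we first explore the effect of activation functions and propose a balanced sampling strategy based on the Sandwich Rule to alleviate weight coupling in the supernet. Then, FLOPs and Zen-Score are adopted to guide the training of the supernet with a ranking correlation loss. Our PGONA ranks 3rd in the supernet track of the CVPR2022 Second Lightweight NAS Challenge. Code is available at https://github.com/pprp/cvpr2022-nas?competition-track1-3th-solution.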
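The Sandwich Rule mentioned above is commonly described as training, at every step, the largest sub-network, the smallest sub-network, and a few randomly sampled ones. The sketch below shows only that basic rule. The balanced variant proposed in the paper is not specified in the abstract and is not reproduced; the search-space layout (one list of candidate channel widths per layer) and the function name are assumptions for illustration.

```python
import random


def sandwich_sample(search_space, num_random=2, seed=None):
    """Return the sub-networks to train in one supernet step.

    `search_space` is a hypothetical list with one entry per layer, each
    entry being the candidate channel widths for that layer. The result is
    [largest, smallest, random_1, ..., random_n].
    """
    rng = random.Random(seed)
    largest = [max(choices) for choices in search_space]
    smallest = [min(choices) for choices in search_space]
    randoms = [[rng.choice(choices) for choices in search_space]
               for _ in range(num_random)]
    return [largest, smallest] + randoms


if __name__ == "__main__":
    space = [[16, 32, 64], [32, 64, 128], [64, 128, 256]]
    for subnet in sandwich_sample(space, num_random=2, seed=0):
        print(subnet)
```

Always including the two extremes bounds the behavior of every sub-network in between, while the random samples cover the rest of the space over many steps.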